Analysis of the Quotation Corpus of the Russian Wiktionary

Authors

  • Alexander V. Smirnov
  • Tatiana Levashova
  • Alexey Karpov
  • Irina S. Kipyatkova
  • Andrey Ronzhin
  • Andrew Krizhanovsky
  • Nataly Krizhanovsky
Abstract

A quantitative evaluation of quotations in the Russian Wiktionary was performed using a specially developed Wiktionary parser. The number of quotations in the dictionary is growing fast (51.5 thousand in 2011, 62 thousand in 2012). These quotations were extracted and saved in the relational database of a machine-readable dictionary, for which tables related to the quotations were designed. A histogram was built showing how the quotations are distributed over the years in which the quoted literary works were written. An attempt was made to explain the characteristics of the histogram by associating it with the years of the most popular and most cited (in the Russian Wiktionary) writers of the nineteenth century. More than one-third of all the quotations (the example sentences) in the Russian Wiktionary were found to have been taken by the editors of Wiktionary entries from the Russian National Corpus.
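The two measurements mentioned in the abstract (a histogram of quotations by year of the quoted work, and the share of quotations taken from one source) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the table name `quote`, its columns, the `"corpus"` source label, and the toy rows are hypothetical and do not reproduce the schema or data of the authors' parser database.

```python
import sqlite3
from collections import Counter


def build_demo_db():
    """Create an in-memory database with a hypothetical `quote` table and toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quote ("
        " id INTEGER PRIMARY KEY,"
        " text TEXT NOT NULL,"
        " author TEXT,"
        " year INTEGER,"   # year the quoted work was written, if known
        " source TEXT)"    # hypothetical label of the corpus the quote came from
    )
    rows = [
        ("example sentence 1", "Author A", 1833, "corpus"),
        ("example sentence 2", "Author A", 1836, "corpus"),
        ("example sentence 3", "Author B", 1869, None),
        ("example sentence 4", "Author C", 1901, None),
        ("example sentence 5", None, None, None),
    ]
    conn.executemany(
        "INSERT INTO quote (text, author, year, source) VALUES (?, ?, ?, ?)", rows
    )
    return conn


def quotes_per_decade(conn):
    """Count quotations per decade, skipping rows without a known year."""
    counts = Counter()
    for (year,) in conn.execute("SELECT year FROM quote WHERE year IS NOT NULL"):
        counts[year // 10 * 10] += 1
    return dict(sorted(counts.items()))


def source_share(conn, source):
    """Return the fraction of all quotations attributed to the given source."""
    total = conn.execute("SELECT COUNT(*) FROM quote").fetchone()[0]
    from_source = conn.execute(
        "SELECT COUNT(*) FROM quote WHERE source = ?", (source,)
    ).fetchone()[0]
    return from_source / total if total else 0.0


if __name__ == "__main__":
    demo = build_demo_db()
    print(quotes_per_decade(demo))
    print(source_share(demo, "corpus"))
```

Grouping by decade rather than by single year is a choice made here only to keep the toy histogram readable; the abstract does not say which bin width the authors used.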

Related resources

The comparison of Wiktionary thesauri transformed into the machine-readable format

Institution of the Russian Academy of Sciences St. Petersburg Institute for Informatics and Automation RAS. http://code.google.com/p/wikokit/ Wiktionary is a unique, peculiar, valuable and original resource for natural language processing (NLP). The paper describes an open-source Wiktionary parser: its architectur...

Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application

The paper describes the first stage of a project on creating an electronic dictionary with numerical estimates of the degree of abstractness and concreteness of Russian words. Our approach is to integrate data obtained from several different sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes) and a translation of a similar dict...

Related terms search based on WordNet / Wiktionary and its application in Ontology Matching

A set of ontology matching algorithms (for finding correspondences between concepts) is based on a thesaurus that provides the source data for the semantic distance calculations. In this wiki era, new resources may spring up and improve this kind of semantic search. In the paper a solution of this task based on Russian Wiktionary is compared to WordNet based algorithms. Metrics are estimated us...

Analysis of the Impact of Economic Sanctions on Health Research and Publication Activities of Scientists from Iran

The article discusses the publication activity of scientists studying the consequences of US economic sanctions against Iran, and the impact of these sanctions on the development of science and the economy in this country. The paper considers the dynamics of publication activity in the field of biomedicine of Iranian scientists over the past 20 years. Increased sanctions have led to a shortage...

Transformation of Wiktionary entry structure into tables and relations in a relational database schema

This paper addresses the question of automatic data extraction from the Wiktionary, which is a multilingual and multifunctional dictionary. Wiktionary is a collaborative project working on the same principles as the Wikipedia. The Wiktionary entry is a plain text from the text processing point of view. Wiktionary guidelines prescribe the entry layout and rules, which should be followed by edito...

Journal:
  • Research in Computing Science

Volume 56

Publication date: 2012